Abstract

Nowadays we are often faced with huge databases resulting from the rapid growth of data 
storage technologies. This is particularly true when dealing with music databases. In this 
context, it is essential to have techniques and tools able to discriminate properties from these 
massive sets. In this work, we report on a statistical analysis of more than ten thousand 
songs aiming to obtain a complexity hierarchy. Our approach is based on the estimation of 
the permutation entropy combined with an intensive complexity measure, building up the 
complexity-entropy causality plane. The results obtained indicate that this representation
space is very promising to discriminate songs as well as to allow a relative quantitative
comparison among songs. Additionally, we believe that the method reported here may be applied
in practical situations since it is simple, robust and has a fast numerical implementation.
Keywords: permutation entropy, music, complexity measure, time series analysis 



1. Introduction 

Nowadays we are experiencing a rapid development of technologies related to data
storage. As an immediate consequence, we are often faced with huge databases that hinder
access to information. Thus, it is necessary to have techniques and tools able to discriminate
elements from these massive databases. Text categorization [1], scene classification [2] and
protein classification [3] are just a few examples where this problem emerges. In a parallel
direction, statistical physicists are increasingly interested in studying the so-called complex
systems [4-8]. These investigations employ established methods of statistical mechanics
as well as recent developments in this field, aiming to extract hidden patterns that
govern the system's dynamics. In a similar way, this framework may help to advance
in distinguishing elements within these databases, with the benefit of the simplicity often
attributed to statistical physics methods.

A very interesting case corresponds to music databases, not only because of the
incredible amount of data (for instance, the iTunes Store has more than 14 million songs),
but also due to the ubiquity of music in our society as well as its deep connection with
cognitive habits and historical developments [9]. In this direction, there are investigations
focused on collective listening habits [10-12], collaboration networks among artists [13],
music sales [14], and the success of musicians [15-17], among others. On the other hand, the sounds
that compose the songs present several complex structures and emergent features which, in
some cases, closely resemble the patterns of out-of-equilibrium physics, such as scale-free
statistics and universality. For instance, the seminal work of Voss and Clarke [18] showed
that the power spectrum associated with the loudness variations and pitch fluctuations of
radio stations (including songs and human voice) is characterized by a $1/f$ noise-like pattern
in the low-frequency domain ($f < 10$ Hz). Klimontovich and Boon [19] argue that this
low-frequency behavior follows from a natural flicker noise theory. However, this finding
has been questioned by Nettheim [20], according to whom the power spectrum may be
better described by $1/f^2$. Fractal structures were also reported by Hsü and Hsü [21, 22]
when studying frequency intervals in classical pieces. It was also found that the
distribution of sound amplitudes may be fitted by a one-parameter stretched Gaussian
and that this non-Gaussian feature is related to correlation aspects present in the songs [23].

These features and others have attracted the attention of statistical physicists, who have
attempted to obtain quantifiers able to distinguish songs and genres. One of these
efforts was made by Jennings et al. [24], who found that the Hurst exponent estimated
from the volatility of the sound intensity depends on the music genre. Correa et al. [25]
investigated four music genres employing a complex network representation for rhythmic
features of the songs. There are still other investigations [26-39], most of which are based
on fractal dimensions, entropies, power spectrum analysis or correlation analysis. It is
worth noting that there are several methods of automatic genre classification emerging
from engineering disciplines (see, for instance, Ref. [40]). In particular, there exists a very
active community working on music classification problems, and several important results
are published at the ISMIR conferences (to mention just a few, see Refs. [41-53]).

However, music genre is not a well-defined concept [54] and, especially, the boundaries
between genres remain fuzzy. Thus, any taxonomy may be controversial, representing
a challenging and open problem of pattern recognition. In addition, some of the proposed
quantifiers require specific algorithms or recipes for processing the sound of the songs, which
may depend on tuning parameters.

Here, we follow an Information Theory approach to quantify aspects of songs.
More specifically, the Bandt and Pompe approach [55] is applied in order to obtain a
complexity hierarchy for songs. This method defines a "natural" complexity measure for
time series based on ordinal patterns. Although this concept has not yet been explored
within the context of music, it has been successfully applied in other areas, such as medical
[56, 57], financial [58, 59] and climatological time series [60, 61]. In this direction, our
main goal is to fill this gap by employing the Bandt and Pompe approach together with a
non-trivial entropic measure [62-64], constructing the so-called complexity-entropy causality
plane [58, 59, 65, 66]. As will be discussed in detail below, we have found that this
representation space is very promising to distinguish songs from huge databases. Moreover,
thanks to its simple and fast implementation, it is possible to envisage its use in practical
situations. In the following, we review some aspects related to the Bandt and Pompe
approach as well as the complexity-entropy causality plane (Section 2). Next, we describe
our database and the results (Section 3). Finally, we end this work with some concluding
comments (Section 4).
2. Methods



The essence of the permutation entropy proposed by Bandt and Pompe [55] is to associate
a symbolic sequence to the time series under analysis. This is done by employing a
suitable partition based on ordinal patterns obtained by comparing neighboring values of
the original series. To be more specific, consider a given time series $\{x_t\}_{t=1,\dots,N}$ and the
following partitions represented by a $d$-dimensional vector ($d > 1$, $d \in \mathbb{N}$)

$$ (s) \mapsto \left(x_{s-(d-1)},\, x_{s-(d-2)},\, \dots,\, x_{s-1},\, x_s\right), $$

with $s = d, d+1, \dots, N$. For each one of these $(N - d + 1)$ vectors, we investigate the
permutations $\pi = (r_0, r_1, \dots, r_{d-1})$ of $(0, 1, \dots, d-1)$ defined by
$x_{s-r_{d-1}} \leq x_{s-r_{d-2}} \leq \dots \leq x_{s-r_1} \leq x_{s-r_0}$ and, for all $d!$ possible
permutations $\pi$, we evaluate the probability distribution $P = \{p(\pi)\}$ given by

$$ p(\pi) = \frac{\#\{s \mid s \leq N - d + 1;\ (s) \text{ has type } \pi\}}{N - d + 1}, $$

where the symbol $\#$ stands for the number (frequency) of occurrences of the permutation
$\pi$. Thus, we define the normalized permutation entropy of order $d$ by

$$ H_S[P] = \frac{S[P]}{\log d!}, \tag{1} $$

with $S[P]$ being the standard Shannon entropy [67]. Naturally, $0 \leq H_S[P] \leq 1$, where the
upper bound occurs for a completely random system, i.e., a system for which all $d!$ possible
permutations are equiprobable. If the time series exhibits some kind of ordering dynamics,
$H_S[P]$ will be smaller than one. As pointed out by Bandt and Pompe [55], the advantages of
this method lie in its simplicity, robustness and very fast computational evaluation.
Clearly, the parameter $d$ (known as embedding dimension) plays an important role in the
estimation of the permutation probability distribution $P$, since it determines the number of
accessible states. In fact, the choice of $d$ depends on the length of the time series in such
a way that the condition $d! \ll N$ must be satisfied in order to obtain reliable statistics.
For practical purposes, Bandt and Pompe recommend $d = 3, \dots, 7$. Here, we have fixed
$d = 5$ because the time series under analysis are long enough (they have more than one
million data values). We have verified that the results are robust concerning the choice
of the embedding dimension $d$.
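
As an illustration of the procedure just described, the following is a minimal Python sketch, not the authors' implementation. The function names `ordinal_distribution` and `permutation_entropy` are hypothetical. Ties are broken by position, which is an assumption, since the text does not say how equal values are handled. Each pattern is encoded as the ordering of positions inside a window, which is only a relabeling of the permutations $\pi$ and leaves the entropy unchanged.

```python
from collections import Counter
from math import factorial, log


def ordinal_distribution(x, d):
    """Relative frequencies of the ordinal patterns of order d found in x."""
    counts = Counter()
    for s in range(len(x) - d + 1):          # N - d + 1 overlapping windows
        window = x[s:s + d]
        # Pattern = window positions sorted by value (ties broken by position).
        pattern = tuple(sorted(range(d), key=lambda i: window[i]))
        counts[pattern] += 1
    total = sum(counts.values())
    return {pattern: c / total for pattern, c in counts.items()}


def permutation_entropy(x, d):
    """Normalized permutation entropy H_S = S[P] / log(d!)."""
    p = ordinal_distribution(x, d)
    shannon = -sum(q * log(q) for q in p.values())
    return shannon / log(factorial(d))
```

For example, `permutation_entropy([4, 7, 9, 10, 6, 11, 3], 2)` counts four increasing and two decreasing pairs. With $d = 5$ there are $5! = 120$ possible patterns, far fewer than the more than one million samples per song, so the condition $d! \ll N$ holds comfortably.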

Advancing with this brief review, we now introduce another statistical complexity measure
able to quantify the degree of physical structure present in a time series [62-64]. Given a
probability distribution $P$, this quantifier is defined by the product of the normalized entropy
$H_S$ and a suitable metric distance between $P$ and the uniform distribution $P_e = \{1/d!\}$.
Mathematically, we may write

$$ C_{JS}[P] = Q_J[P, P_e]\, H_S[P], \tag{2} $$

where

$$ Q_J[P, P_e] = \frac{S\left[(P + P_e)/2\right] - S[P]/2 - S[P_e]/2}{Q_{\max}} $$

and $Q_{\max}$ is the maximum possible value of the numerator, obtained when one of the components
of $P$ is equal to one and all the others vanish, i.e.,

$$ Q_{\max} = -\frac{1}{2}\left[\frac{d! + 1}{d!}\log(d! + 1) - 2\log(2\,d!) + \log(d!)\right]. $$

The quantity $Q_J$, usually known as disequilibrium, will be different from zero if some
of the accessible states are more likely than others. It is worth noting that the complexity measure
$C_{JS}$ is not a trivial function of the entropy [62] because it depends on two different probability
distributions, the one associated with the system under analysis, $P$, and the uniform
distribution, $P_e$. It quantifies the existence of correlational structures, providing important
additional information that may not be carried by the permutation entropy alone. Furthermore,
it was shown that for a given $H_S$ value, there exists a range of possible $C_{JS}$ values [68].
Motivated by the previous discussion, Rosso et al. [65] proposed to employ a diagram of
$C_{JS}$ versus $H_S$ for distinguishing between stochasticity and chaoticity. This representation
space, called the complexity-entropy causality plane [58, 59, 65], will herein be our approach for
distinguishing songs.
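
The complexity measure can be sketched in a few lines of Python. This is a hedged illustration rather than the authors' code: it assumes the disequilibrium is the Jensen-Shannon divergence between $P$ and the uniform distribution, normalized by $Q_{\max}$, and the names `shannon`, `disequilibrium` and `plane_point` are hypothetical. The input `p` is a list of pattern probabilities of order `d`; patterns that never occur may simply be omitted.

```python
from math import factorial, log


def shannon(p):
    """Shannon entropy S[P] (natural log); zero-probability terms contribute nothing."""
    return -sum(q * log(q) for q in p if q > 0)


def disequilibrium(p, d):
    """Normalized Jensen-Shannon distance Q_J between P and the uniform P_e."""
    n = factorial(d)
    p = list(p) + [0.0] * (n - len(p))       # unobserved patterns have probability 0
    mixture = [(q + 1.0 / n) / 2 for q in p]
    js = shannon(mixture) - shannon(p) / 2 - log(n) / 2   # S[P_e] = log(d!)
    q_max = -0.5 * ((n + 1) / n * log(n + 1) - 2 * log(2 * n) + log(n))
    return js / q_max


def plane_point(p, d):
    """Return (H_S, C_JS), the coordinates in the complexity-entropy causality plane."""
    h = shannon(p) / log(factorial(d))
    return h, disequilibrium(p, d) * h
```

Both extremes give zero complexity: a uniform distribution has $Q_J = 0$, while a distribution concentrated on a single pattern has $H_S = 0$. Only intermediate distributions yield $C_{JS} > 0$.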

The concept of ordinal patterns can be straightforwardly generalized to non-consecutive
samples by introducing a lag of $\tau$ (usually known as embedding delay) sampling times. With
$\tau = 1$ the consecutive case is recovered, and the analysis focuses on the highest frequency
contained within the time series. It is clear that different time scales are taken into account by
changing the embedding delay of the symbolic reconstruction. The importance of selecting
an appropriate embedding delay in the estimation of the permutation quantifiers has been
recently confirmed for different purposes, such as identifying intrinsic time scales of delayed
systems [69, 70], quantifying the degree of unpredictability of the high-dimensional chaotic
fluctuations of a semiconductor laser subject to optical feedback [71], and classifying cardiac
biosignals [72]. We have found that an embedding delay $\tau = 1$ is the optimal one for our
music categorization goal, since when this parameter is increased the permutation entropy
increases and the permutation statistical complexity decreases. Thus, the range of variation
of both quantifiers becomes smaller and, consequently, it is more difficult to distinguish songs and
genres.
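
The role of the embedding delay can be made concrete with a short Python sketch. The function name `ordinal_patterns` and the toy series are hypothetical and serve only as an illustration. Each window takes $d$ samples spaced $\tau$ apart, so a larger $\tau$ probes slower time scales.

```python
def ordinal_patterns(x, d, tau=1):
    """Ordinal patterns of order d built from samples spaced tau apart."""
    span = (d - 1) * tau                      # distance between first and last sample
    patterns = []
    for s in range(len(x) - span):
        window = x[s:s + span + 1:tau]        # x[s], x[s+tau], ..., x[s+(d-1)*tau]
        patterns.append(tuple(sorted(range(d), key=lambda i: window[i])))
    return patterns


# Toy series (not audio): a slow upward trend with a fast alternation on top.
x = [0, 5, 1, 6, 2, 7, 3, 8]
fast = ordinal_patterns(x, d=2, tau=1)   # sees the sample-to-sample alternation
slow = ordinal_patterns(x, d=2, tau=2)   # skips it and sees only the trend
```

With `tau=1` the patterns alternate between rising and falling, whereas with `tau=2` every pattern is rising: the fast oscillation is invisible at the larger delay.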

3. Data Presentation and Results 

It is clear that a music piece can be naturally considered as the time evolution of an 
acoustic signal and time irreversibility is inherent to musical expression |2ni El]. From 
the physical point of view, the songs may be considered as pressure fluctuations traveling
through the air. These waves are perceived by the auditory system, leading to the sense of
hearing. In the case of recordings, these fluctuations are converted into a voltage signal by
a recording system and then stored, for instance, on a compact disc (CD). The perception of
sound is usually limited to a certain range of frequencies; for human beings the full audible
range is approximately between 20 Hz and 20 kHz. Because of this limitation, recording
systems often employ a sampling rate of 44.1 kHz, which encompasses the whole audible spectrum.
All the songs analyzed here have this sampling rate.
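
The reason a 44.1 kHz rate covers the audible range follows from the sampling theorem, a standard result that the text does not state explicitly: a signal sampled at rate $f_s$ can represent frequencies up to $f_s/2$. As a worked check,

$$ \frac{f_s}{2} = \frac{44.1\ \text{kHz}}{2} = 22.05\ \text{kHz} > 20\ \text{kHz}, \qquad \frac{1}{f_s} = \frac{1}{44100\ \text{Hz}} \approx 2.27 \times 10^{-5}\ \text{s}, $$

so consecutive samples are about 23 microseconds apart, and one minute of audio contains $60 \times 44100 \approx 2.6 \times 10^{6}$ samples per channel.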

Our database consists of 10124 songs distributed into ten different music genres:
blues (1020), classical (997), flamenco (679), hiphop (1000), jazz (700), metal (1638),
Brazilian popular music - mpb (580), pop (1000), tango (1016) and techno (1494). The
songs were chosen aiming to cover a large number of composers and singers. To achieve this
and also to determine the music genre via an external judgment, we tried to select CDs that
are compilations of a given genre or from representative musical groups of a given genre.

Figure 1: A graphical representation of 4 songs from 4 different genres. In the left panel we show the
amplitude series and in the right panel the intensity series. The music genres are blues, classic, metal and
techno, respectively. [Panel titles: Rory Gallagher - Loanshark Blues; Tchaikovsky - Hungarian Dance;
Motorhead - Bomber; DJ Splash - Ring Dinge Ding. Horizontal axis: t (minutes).]

Using this database, we focus our analysis on two time series directly obtained
from the digitized files that represent each song: the sound amplitude series and
the sound intensity series, i.e., the square of the amplitude. Figure 1 shows these two time
series for several songs. We evaluate the normalized entropy $H_S$ and the statistical complexity
measure $C_{JS}$ for the amplitude and intensity series associated with each song, as shown in
Figs. 2a and 2b. Notice that both series, amplitude and intensity, lead to similar behavior,
contrary to what happens with other quantifiers. For instance, when dealing with the Hurst
exponent it is preferable to work with the intensities [23] or volatilities [24], since the amplitudes
are intrinsically anti-correlated due to the oscillatory nature of the sound. Moreover, we
have found that there is a large range of possible $H_S$ and $C_{JS}$ values. This wide variation
allows a relative comparison among songs, and one may ask to listen to songs that are
limited to some interval of $H_S$ and/or $C_{JS}$ values. We also evaluate the mean values of
$C_{JS}$ and $H_S$ over all songs grouped by genre, as shown in Figs. 2c and 2d. These mean values
enable us to quantify the complexity of each music genre. In particular, we can observe that
high art music genres (e.g. classic, jazz and tango) are located in the central part of the
complexity plane, being equally distant from the fully random limit ($H_S \to 1$ and $C_{JS} \to 0$)
and also from the completely regular case ($H_S \to 0$ and $C_{JS} \to 0$). On the other hand,
light/dance music genres (e.g. pop and techno) are located closer to the fully random limit
(white noise). In this context, our approach agrees with other works [23, 24, 28].

Therefore, we have verified that the ordinal pattern distribution that exists among the
sound amplitude values and also among the sound intensities is capable of spreading out the
songs of our database through the complexity-entropy causality plane. It is interesting to remark
that the embedding dimension employed here ($d = 5$) corresponds to approximately $10^{-4}$
seconds (five samples at 44.1 kHz). Thus, it is surprising how this very short time dynamics
retains so much information about the songs. We also investigated shuffled versions of each
song series, aiming to verify whether the location of the songs in the complexity-entropy
causality plane is directly related to the presence of correlations in the music time series.
This analysis is shown in Figs. 3 and 4 for each song and for all genres. We have obtained
$H_S \approx 1$ and $C_{JS} \approx 0$ for all shuffled series, confirming that correlations inherently
present in the original songs are the main source of the different locations in this plane.
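
The shuffling test can be reproduced on a toy signal with the Python sketch below. It is an illustration under stated assumptions: the signal is a synthetic smooth curve standing in for a song (not real audio), only the entropy is computed, and the helper `permutation_entropy` is a compact hypothetical implementation with ties broken by position.

```python
import math
import random
from collections import Counter


def permutation_entropy(x, d=5):
    """Normalized permutation entropy of order d for consecutive samples."""
    counts = Counter(
        tuple(sorted(range(d), key=lambda i: x[s + i]))
        for s in range(len(x) - d + 1)
    )
    total = sum(counts.values())
    shannon = -sum(c / total * math.log(c / total) for c in counts.values())
    return shannon / math.log(math.factorial(d))


# Synthetic, strongly correlated signal (a stand-in for a song, not real audio).
signal = [math.sin(0.05 * t) + 0.5 * math.sin(0.013 * t) for t in range(20000)]

# Shuffling keeps the values but destroys their temporal order.
shuffled = signal[:]
random.Random(0).shuffle(shuffled)

h_original = permutation_entropy(signal)
h_shuffled = permutation_entropy(shuffled)
print(h_original, h_shuffled)
```

The shuffled series has the same amplitude distribution as the original, so any difference between the two entropies comes from temporal correlations alone.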

Although our approach is not focused on determining which music genre is related to
a particular given song, this novel physical method may help to understand the complex
situation that emerges in the problem of automatic genre classification. For instance, we can
take a glance at the fuzzy boundaries existing in the music genre definitions by evaluating
the distribution of $H_S$ and $C_{JS}$ values. Figure 5 shows these distributions for both time
series employed here. There are several overlapping regions among the distributions of
$H_S$ and $C_{JS}$ for the different genres. This overlap illustrates how fuzzy the
boundaries between genres and, consequently, the very concept of music genre can be. It
is also interesting to observe that some genres have more localized PDFs; for instance, the
techno genre is practically bounded to the interval (0.85, 0.95) of $H_S$ values for the intensity
series, while the flamenco or mpb genres have a wider distribution. To go beyond the previous
analysis, we try to quantify the efficiency of the permutation indexes $H_S$ and $C_{JS}$ in a practical
scenario of automatic genre classification. In order to do this, we use an implementation [73]
of a support vector machine (SVM) [73], where we have considered the values of $H_S$ and $C_{JS}$
for the amplitude and intensity series as features of the SVM. We run the analysis for each
genre, training the SVM with 90% of the dataset and performing an automatic detection over
the remaining 10%. It is a simplified version of the SVM, where the system has to make a
binary choice, i.e., to choose between a given genre and all the others. The accuracy rates
of automatic detection are shown in Table 1. Note that the accuracy values are around 90%
within this simplified implementation; however, we have to remark that in a multiple choice
system these values should be much smaller. On the other hand, this analysis indicates that
the entropic indexes employed here may be used in practical situations.

Figure 2: (color online) Complexity-entropy causality plane, i.e., $C_{JS}$ versus $H_S$ for all the songs when
considering (a) the amplitude series and (b) the intensity series. In (c) and (d), we show the mean value of
$C_{JS}$ and $H_S$ for each genre. The upper (bottom) dashed line represents the maximum (minimum) value of
$C_{JS}$ as a function of $H_S$ for $d = 5$ and the different symbols refer to the 10 different genres. For a better
visualization of the different genres see also Figs. 3 and 4.

Figure 3: Complexity-entropy causality plane for the amplitude series by music genres when considering the
original and shuffled series. The upper (bottom) dashed line represents the maximum (minimum) value of
$C_{JS}$ as a function of $H_S$ for $d = 5$ and the arrows indicate the shuffled analysis.

Figure 4: Complexity-entropy causality plane for the intensity series by music genres when considering the
original and shuffled series. The upper (bottom) dashed line represents the maximum (minimum) value of
$C_{JS}$ as a function of $H_S$ for $d = 5$ and the arrows indicate the shuffled analysis.

Table 1: Percentage of correct choices of the SVM for each genre

Genre       Accuracy    Genre     Accuracy
Blues       87.87%      Metal     89.89%
Classic     92.03%      MPB       97.15%
Flamenco    95.12%      Pop       88.11%
Hiphop      88.11%      Tango     87.87%
Jazz        91.68%      Techno    87.14%

4. Summary and Conclusions 

Summing up, in this work we applied the permutation entropy [55], $H_S$, and an intensive
statistical complexity measure [62-64], $C_{JS}$, to differentiate songs. Specifically, we analyzed
the location of the songs in the complexity-entropy causality plane. This permutation information
theory approach enabled us to quantitatively classify songs in a kind of complexity
hierarchy.

We believe that the findings presented here may be applied in practical situations as well
as in technological applications related to the distinction of songs in massive databases. In
this respect, the Bandt and Pompe approach has some advantageous technical features, such
as its simplicity, robustness, and, principally, a very fast numerical evaluation.
Figure 5: (color online) Probability distribution functions (PDF) for the values of (a) $H_S$ and (b) $C_{JS}$ when
considering the amplitude series grouped by music genre. Panels (c) and (d) show the same PDFs for the
intensity series.